A New Sentence Compression Dataset and Its Use in an Abstractive Generate-and-Rank Sentence Compressor

نویسندگان

  • Dimitrios Galanis
  • Ion Androutsopoulos
چکیده

Sentence compression has attracted much interest in recent years, but most sentence compressors are extractive, i.e., they only delete words. There is a lack of appropriate datasets to train and evaluate abstractive sentence compressors, i.e., methods that apart from deleting words can also rephrase expressions. We present a new dataset that contains candidate extractive and abstractive compressions of source sentences. The candidate compressions are annotated with human judgements for grammaticality and meaning preservation. We discuss how the dataset was created, and how it can be used in generate-and-rank abstractive sentence compressors. We also report experimental results with a novel abstractive sentence compressor that uses the dataset.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

From Extractive to Abstractive Meeting Summaries: Can It Be Done by Sentence Compression?

Most previous studies on meeting summarization have focused on extractive summarization. In this paper, we investigate if we can apply sentence compression to extractive summaries to generate abstractive summaries. We use different compression algorithms, including integer linear programming with an additional step of filler phrase detection, a noisychannel approach using Markovization formulat...

متن کامل

Supervised Sentence Fusion with Single-Stage Inference

Sentence fusion—the merging of sentences containing similar information— has been shown to be useful in an abstractive summarization context. We present a new dataset of sentence fusion instances obtained from evaluation datasets in summarization shared tasks and use this dataset to explore supervised approaches to sentence fusion. Our proposed inference approach recovers the highest scoring ou...

متن کامل

Generate Compressed Sentences with Stanford Typed Dependencies towards Abstractive Summarization

In this paper, we implement sentence generation process towards generate abstractive summarization which is proposed by (Genest and Lapalme, 2010). We simply use Stanford Typed Dependencies1 to extract information items and generate multiple compressed sentences via Natural Language Generation engine. Then we follow LexRank based sentence ranking combined with greedy sentence selection to build...

متن کامل

Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination

Chemical Named Entity Recognition (NER) is the basic step for consequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, extraction of the names of the molecules and their properties. Improvement in the performance of such systems may affects the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...

متن کامل

A Dataset and Evaluation Metrics for Abstractive Compression of Sentences and Short Paragraphs

We introduce a manually-created, multireference dataset for abstractive sentence and short paragraph compression. First, we examine the impact of singleand multi-sentence level editing operations on human compression quality as found in this corpus. We observe that substitution and rephrasing operations are more meaning preserving than other operations, and that compressing in context improves ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011